
Lamina

Neural Information Processing Systems

How different cell types in a neural system contribute to signal processing by the entire circuit is a prime question in neuroscience.


Transfer learning for scalar-on-function regression via control variates

Yang, Yuping, Zhou, Zhiyang

arXiv.org Machine Learning

Transfer learning (TL) has emerged as a powerful tool for improving estimation and prediction performance by leveraging information from related datasets. In this paper, we repurpose the control-variates (CVS) method for TL in the context of scalar-on-function regression. Our proposed framework relies exclusively on dataset-specific summary statistics, avoiding the need to pool subject-level data and thus remaining applicable in privacy-restricted or decentralized settings. We establish theoretical connections among several existing TL strategies and derive convergence rates for our CVS-based proposals. These rates explicitly account for the typically overlooked smoothing error and reveal how the similarity among covariance functions across datasets influences convergence behavior. Numerical studies support the theoretical findings and demonstrate that the proposed methods achieve competitive estimation and prediction performance compared with existing alternatives.
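As background, the control-variates technique that the paper repurposes reduces an estimator's variance by subtracting a correlated quantity with known mean. A minimal Monte Carlo sketch of the classical idea (generic, not the paper's scalar-on-function construction):

```python
import numpy as np

rng = np.random.default_rng(0)

# Plain Monte Carlo estimate of E[exp(U)] with U ~ Uniform(0, 1);
# the true mean is e - 1.
n = 10_000
u = rng.uniform(0.0, 1.0, n)
f = np.exp(u)
plain = f.mean()

# Control variate: g(U) = U is correlated with f and has known mean 0.5.
g = u
beta = np.cov(f, g)[0, 1] / np.var(g)   # variance-minimizing coefficient
adjusted = f - beta * (g - 0.5)          # same expectation, lower variance
cv = adjusted.mean()
```

Because the correction term has mean zero, the adjusted estimator is unbiased; the stronger the correlation between f and g, the larger the variance reduction.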


On the Practical Estimation and Interpretation of Rényi Transfer Entropy

Tabachová, Zlata, Jizba, Petr, Lavička, Hynek, Paluš, Milan

arXiv.org Machine Learning

Rényi transfer entropy (RTE) is a generalization of classical transfer entropy that replaces Shannon's entropy with Rényi's information measure. This, in turn, introduces a new tunable parameter $α$, which accounts for sensitivity to low- or high-probability events. Although RTE shows strong potential for analyzing causal relations in complex, non-Gaussian systems, its practical use is limited, primarily due to challenges related to its accurate estimation and interpretation. These difficulties are especially pronounced when working with finite, high-dimensional, or heterogeneous datasets. In this paper, we systematically study the performance of a k-nearest neighbor estimator for both Rényi entropy (RE) and RTE using various synthetic datasets with clear cause-and-effect relationships inherent to their construction. We test the estimator across a broad range of parameters, including sample size, dimensionality, memory length, and Rényi order $α$. In particular, we apply the estimator to a set of simulated processes with increasing structural complexity, ranging from linear dynamics to nonlinear systems with multi-source couplings. To address interpretational challenges arising from potentially negative RE and RTE values, we introduce three reliability conditions and formulate practical guidelines for tuning the estimator parameters. We show that when the reliability conditions are met and the parameters are calibrated accordingly, the resulting effective RTE estimates accurately capture directional information flow across a broad range of scenarios. The results show that the explanatory power of RTE depends sensitively on the choice of the Rényi parameter $α$. This highlights the usefulness of the RTE framework for identifying the drivers of extreme behavior in complex systems.
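For reference, the Rényi entropy underlying RTE is $H_α(p) = \frac{1}{1-α}\log\sum_i p_i^α$, which recovers Shannon entropy as $α \to 1$. A plug-in sketch for a known discrete distribution (illustrative only; the paper works with a k-nearest-neighbor estimator on continuous data):

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy H_alpha(p) = log(sum p_i^alpha) / (1 - alpha), in nats."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return -np.sum(p * np.log(p))    # Shannon limit
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

p = np.array([0.7, 0.2, 0.1])
# alpha < 1 weights rare events more heavily; alpha > 1 emphasizes
# likely events, so H_alpha is non-increasing in alpha.
h_half = renyi_entropy(p, 0.5)
h_one  = renyi_entropy(p, 1.0)
h_two  = renyi_entropy(p, 2.0)   # collision entropy: -log(sum p_i^2)
```

Varying $α$ on the same distribution makes the tunable sensitivity concrete: the three values decrease monotonically, and all coincide at $\log K$ for the uniform distribution over $K$ outcomes.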


Focus of Attention Improves Information Transfer in Visual Features

Neural Information Processing Systems

Unsupervised learning from continuous visual streams is a challenging problem that cannot be naturally and efficiently managed in the classic batch-mode setting of computation. The information stream must be carefully processed according to an appropriate spatio-temporal distribution of the visual data, whereas most learning approaches commonly assume a uniform probability density. In this paper we focus on unsupervised learning for transferring visual information in a truly online setting, using a computational model inspired by the principle of least action in physics. The maximization of the mutual information is carried out by a temporal process that yields online estimation of the entropy terms. The model, which is based on second-order differential equations, maximizes the information transfer from the input to a discrete space of symbols related to the visual features of the input, whose computation is supported by hidden neurons. To better structure the input probability distribution, we use a human-like focus-of-attention model that, coherently with the information-maximization model, is also based on second-order differential equations. We provide experimental results supporting the theory: the spatio-temporal filtering induced by the focus of attention allows the system to transfer more information from the input stream over the focused areas and, in some contexts, over whole frames, compared with the unfiltered case, which yields uniform probability distributions.
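The quantity being maximized here, the mutual information between input and symbols, decomposes as $I(X;Y) = H(X) + H(Y) - H(X,Y)$. A toy discrete-case sketch of that identity (generic; the paper instead estimates the entropy terms online from a continuous stream):

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a discrete joint pmf, in nats."""
    p_xy = np.asarray(p_xy, dtype=float)
    px = p_xy.sum(axis=1)   # marginal of X (rows)
    py = p_xy.sum(axis=0)   # marginal of Y (columns)

    def H(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    return H(px) + H(py) - H(p_xy.ravel())

# Deterministic coupling: Y = X, so I(X;Y) = H(X) = log 2.
p_det = np.array([[0.5, 0.0],
                  [0.0, 0.5]])

# Independent variables: the joint factorizes, so I(X;Y) = 0.
p_ind = np.outer([0.5, 0.5], [0.5, 0.5])
```

The two extreme joints bracket the behavior the model exploits: information transfer is largest when the symbol assignment is a deterministic function of the input and vanishes when the two are independent.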




All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens

Mamidanna, Siddarth, Rai, Daking, Yao, Ziyu, Zhou, Yilun

arXiv.org Artificial Intelligence

Large language models (LLMs) demonstrate proficiency across numerous computational tasks, yet their inner workings remain unclear. In theory, the combination of causal self-attention and multilayer perceptron layers allows every token to access and compute information based on all preceding tokens. In practice, to what extent are such operations present? In this paper, on mental math tasks (i.e., direct math calculation via next-token prediction without explicit reasoning), we investigate this question in three steps: inhibiting input-specific token computations in the initial layers, restricting the routes of information transfer across token positions in the next few layers, and forcing all computation to happen at the last token in the remaining layers. With two proposed techniques, Context-Aware Mean Ablation (CAMA) and Attention-Based Peeking (ABP), we identify an All-for-One subgraph (AF1) with high accuracy on a wide variety of mental math tasks, where meaningful computation occurs very late (in terms of layer depth) and only at the last token, which receives information from other tokens in a few specific middle layers. Experiments on a variety of models and arithmetic expressions show that this subgraph is sufficient and necessary for high model performance, transfers across different models, and works on a variety of input styles. Ablations on different CAMA and ABP alternatives reveal their unique advantages over other methods, which may be of independent interest.
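The idea of restricting information-transfer routes via attention masking can be illustrated with a toy single-head example. This is a generic sketch, not the paper's CAMA or ABP procedure: in the "restricted" layers, every position except the last attends only to itself, so only the last token can gather information from the rest of the sequence.

```python
import numpy as np

def attention(q, k, v, mask):
    """Single-head scaled dot-product attention with an additive mask
    (0 = position visible, large negative = position blocked)."""
    scores = q @ k.T / np.sqrt(q.shape[-1]) + mask
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

T, d = 4, 8
rng = np.random.default_rng(0)
q = rng.normal(size=(T, d))
k = rng.normal(size=(T, d))
v = rng.normal(size=(T, d))

NEG = -1e9
# Restricted mask: earlier positions are frozen to attend only to
# themselves, while the last token may read every position.
restricted = np.full((T, T), NEG)
np.fill_diagonal(restricted, 0.0)
restricted[-1, :] = 0.0

out = attention(q, k, v, restricted)
# Earlier positions pass their value vectors through unchanged; only
# the last row of `out` aggregates information across positions.
```

Under this mask, rows 0..T-2 of the output reproduce the corresponding value vectors exactly, which is the masked analogue of "no information transfer except into the last token."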


Focus of Attention Improves Information Transfer in Visual Features

Neural Information Processing Systems

The temporal trajectories of the variables of the learning problem are modeled by the so-called 4th-order Cognitive Action Laws (CALs), which arise from the stationarity conditions of a functional, as happens for generalized coordinates in classical mechanics.
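The fourth-order character of such laws follows a standard calculus-of-variations pattern: when the functional's integrand depends on second time derivatives, stationarity yields the Euler-Poisson equation, which is fourth order in the trajectory. A generic sketch (not the paper's specific functional $F$):

```latex
% Stationarity of  \int F(t, q, \dot q, \ddot q)\, dt  requires
\frac{\partial F}{\partial q}
  - \frac{d}{dt}\frac{\partial F}{\partial \dot q}
  + \frac{d^2}{dt^2}\frac{\partial F}{\partial \ddot q} = 0,
% and the last term contributes d^4 q / dt^4, i.e., a 4th-order law.
```

When $F$ depends only on $q$ and $\dot q$, the last term vanishes and the familiar second-order Euler-Lagrange equation of classical mechanics is recovered.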